RE: [GENERAL][SQL] 'denormalising' with a select - Mailing list pgsql-sql

From Stuart Rison
Subject RE: [GENERAL][SQL] 'denormalising' with a select
Date
Msg-id v04020a18b37af282ce20@[128.40.242.190]
Whole thread Raw
List pgsql-sql
A synopsis of 'the top-tastic answers' I got from Micheal and Herouth to my
posting (copied at the end of the e-mail).

1) by SQL

Herouth wrote:

>The question is, why would you want to do something like that? It doesn't
>add any more information than your basic table has. It is merely a
>different way of presenting the same information.
>
>Do you simply want to display it this way? What do you want to do with it
>on the frontend?

Very good point Herouth, I was indeed just trying to present the
information in a different format rather than actually requiring the codes
to be available as seperate fields in a new table.

>It's not impossible. You can do this with a five-way self-join and a lot of
> <> clauses in the where.

Out of curiosity, I tried it and, as Herouth points out, the 'general
formula' (here illustrated to get three codes into seperate fields) is:

SELECT DISTINCT t1.brecard_id, t1.code as code1, t2.code as code2, t3.code
as code3 -- etc.
FROM benign_pathologies t1, benign_pathologies t2, benign_pathologies t3 --etc.
WHERE t1.code<>t2.code
AND t2.code<>t3.code
AND t3.code<>t1.code
-- continue the inequalities for all possible pairs so that 3 tables yield
3 inequalities, 4 table yield 6 inequalities, 5 tables yield 10
inequalities etc.
AND t1.brecard_id=t2.brecard_id
AND t2.brecard_id=t3.brecard_id
AND t3.brecard_id=t1.brecard_id
-- continue all equijoins such that each table is equi-joined to the next
one except for the last table which is joined to the first table
;

>I don't think there is a way you can do the above in SQL without knowing
>the maximum number of codes in advance.

I'm pretty sure you are correct, certainly not a non-procedural way.

The other limitation with this SELECT is that it will ONLY select
brecard_id's which have X or more codes associated with it (where X is the
number of tables in your target list) and if there are more the X codes,
the 'extra' ones will be ignored (but not in a consistant fashion).

and of course, it is very (and exponentially) inefficient!

The other two suggestion were a procedural method (Michael) and using
aggregates (Michael and Herouth)

2) procedurally:

from Michael:

>I would write a function that returns a text containing a list of all codes
>with the brecard_id.

using his suggestion which needed only minimal changes, I got:

create function get_codes(bpchar) returns text as
'
declare
        rec record;
        rc text;
        sep text;
begin
        rc := '''';
        sep := '''';
        for rec in select distinct code from benign_pathologies where
brecard_id = $1 LOOP
                rc := (rc || sep) || rec.code;  -- parens are not needed in
version 6.5
                sep := '', '';
        end LOOP;
        return rc;
end;
'
language 'plpgsql';

followed by:

SELECT DISTINCT brecard_id, get_codes(brecard_id) from benign_pathologies;

Which I recon would not be too dificult to adapt to INSERT rec.code(s) as
seperate fields into a temp table.

3) using aggregates (and again very minimally adapted from Micheal):

CREATE FUNCTION get_codes_agf(bpchar, bpchar) returns bpchar as '
begin
    IF (length($1) > 0) THEN
        return ($1 || '', '') || $2;
        ELSE
        return $2;
        END IF;
END;'
language 'plpgsql';

CREATE AGGREGATE get_codes_ag (
    sfunc1 = get_codes_agf,
    basetype = bpchar,
    stype1 = bpchar,
    initcond1 = ''
);

I just could not believe how elegant this last solution was, of course,
because at no point in the definitions does a table name appear, the
solution is general too!

Finally, and for those of you who put up with all my woffle, is it the
'done thing' in these mailing lists to present a synopsis of answers to
your postings or did I just bore everyone and waste loads of bandwidth?

Regards,

Stuart.

>> > -----Original Message-----
>> > From:    Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk]
>> > Sent:    Tuesday, June 01, 1999 6:54 AM
>> > To:    pgsql-general@postgreSQL.org
>> > Subject:    [GENERAL][SQL] 'denormalising' with a select
>> >
>> > Hi there,
>> >
>> > This was posted to SQL where it 'truly' belongs but I got no answwer and
>> > since it has a bit of database design in it (and a lot more people seem
>> to
>> > read [GENERAL]) I thought I'd try it here.
>> >
>> > Consider a table like this:
>> >
>> > brecard_id      |code
>> > ----------------+----
>> > IEGA18051999006 |COME
>> > IPHA04031999004 |CRIB
>> > IPHA04031999005 |COME
>> > IPHA04031999005 |CRIB
>> > IPHA26021999006 |SOLI
>> > IPHA26021999010 |COME
>> > IPHA26021999010 |SOLI
>> > ISTL04031999001 |CRIB
>> > IUCH03031999003 |COME
>> > IUCH03031999003 |CRIB
>> > IUCH03031999003 |MICR
>> > IUCH03031999003 |SOLI
>> >
>> > each combination of id and code is unique (they form a composite primary
>> > key)
>> > but any brecard_id could have 1 or more codes associated with it
>> > (theoretically with no upper boundary but let us say a maximum of 5
>> > codes).
>> >
>> > Is there a SELECT which will turn each of the codes for one brecard_id
>> > into
>> > a column... ie.
>> >
>> > brecard_id      |code1|code2|code3|code4|code5
>> > ----------------+-----+-----+-----+-----+-----
>> > IEGA18051999006 |COME |     |     |     |
>> > IPHA04031999004 |CRIB |     |     |     |
>> > IPHA04031999005 |COME |CRIB |     |     |
>> > IPHA26021999006 |SOLI |     |     |     |
>> > IPHA26021999010 |COME |SOLI |     |     |
>> > ISTL04031999001 |CRIB |     |     |     |
>> > IUCH03031999003 |COME |CRIB |MICR |     |
>> > IUCH03031999003 |SOLI |     |     |     |
>> >
>> > and here a a few more brainteasers for you gurus out there...
>> >
>> > 1) I'm actually not fussed about the order the codes appear in the
>> > columns,
>> > but let's say the order mattered, would this affect the SELECT(s)?
>> > 2) Would it make the query easier if I knew the maximum number of codes
>> > one
>> > brecard_id could have?
>> > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could
>> > you
>> > write a 'generalised' query which could cope with tables having variable
>> > 'maximum' numbers of codes associated with each brecard_id?
>> >
>> > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL
>> > and
>> > have started playing around with that but I'd hate to re-invent the
>> wheel!
>> >
>> > regards,
>> >
>> > Stuart.
+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+

pgsql-sql by date:

Previous
From: Herouth Maoz
Date:
Subject: Re: [SQL] Order by
Next
From: "Pham, Thinh"
Date:
Subject: RE: [SQL] Column name's length